1 Exploratory Data Analysis

1.0.1 General distribution of sales

We can see that CA accounts for more than 40% of sales, while WI and TX have roughly equal shares.

We can see that the two stores with the most sales are in CA, which is not surprising given the previous plot. Our store, TX_1, is roughly average in terms of sales.

Now, let’s focus on our store, TX_1.

1.0.2 Distribution of sales for TX_1

We can see that food represents the vast majority of sales with more than 60%, followed by household. Hobbies represent less than 10% of sales.

Unsurprisingly, it is one of the food departments that accounts for the largest share of sales. HOUSEHOLD_1 accounts for most of the household department’s sales.

We can see that the store makes more sales during the weekend.

It seems that sales are slightly higher between February and June and lower between November and January.

All years seem similar, with slightly lower sales at the beginning and end of the year and higher sales between May and September. Sales are exceptionally high in July 2015. Note that not all products were on sale from January 2011, which is probably why 2011 does not follow the seasonality before August.

1.0.3 Evolution of prices

This plot shows whether there is a trend in the applied prices. Average prices are lower at the beginning, but this is probably because not all products were yet on sale in 2011.

In this case there doesn’t seem to be any particular trend.

We have selected product "FOODS_3_794" in particular to show the possible price variations over time. The three downward peaks are probably discounts or errors in the data.

1.0.4 Events

There is a majority of national and religious events. It would be interesting to know which events in particular fall within the 28 days that we have to predict.

If we look manually, the only events that take place during the 28 days to be predicted are: “NBA Finals”, “Independence day”, “Ramadan start” and “Father’s day”. In 2016, within that time frame, only Independence Day took place.

term                                     estimate   std.error  statistic  p.value
(Intercept)                             2925.6242    68.44472  42.744339  0.000000
event_name_1 == "IndependenceDay"TRUE    112.1758   379.85276   0.295314  0.768157

We can see that Independence Day does not have a significant impact on sales.
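To make this check concrete: with a single 0/1 regressor, the OLS slope is simply the difference between mean sales on event days and on other days. A minimal sketch (in Python, with made-up numbers rather than the report’s data):

```python
# Sketch (hypothetical numbers): with one binary regressor, the OLS
# slope equals mean(sales | event) - mean(sales | no event).
def event_effect(sales, is_event):
    on = [s for s, e in zip(sales, is_event) if e]
    off = [s for s, e in zip(sales, is_event) if not e]
    return sum(on) / len(on) - sum(off) / len(off)

sales = [2900, 3100, 2800, 3050, 3010]
is_event = [0, 0, 0, 0, 1]               # 1 on the event day
effect = event_effect(sales, is_event)   # 3010 - 2962.5 = 47.5
```

A formal significance judgment would still require the coefficient’s standard error, as in the R output above.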

1.0.5 Time series graphics

Looking at sales by day, there seems to be an annual seasonality, and we can see Christmas Day each year, when the store is closed. Sales are a bit lower at the beginning.

If we decompose the time series by category, the seasonality we observed seems to come only from FOOD. We always see sales at zero on Christmas Day. The lower sales at the beginning of the data that could be seen in the previous plot come only from HOUSEHOLD. This may be due to the fact that not all products were sold at the beginning of the data.

As we have daily data, we may have several seasonalities. We use the mstl function, which decomposes the series into different seasonal components, a trend and a remainder. Here we see a weekly seasonality, but some seasonality still appears to remain in the trend.
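The report uses R’s mstl for this decomposition. As a very rough sketch of the weekly part only (Python, toy data; mstl itself uses iterated STL fits, not simple day-of-week means):

```python
# Crude weekly seasonal extraction: the seasonal index for each
# day-of-week is that day's mean minus the overall mean; subtracting
# it deseasonalizes the series (toy data, not the report's).
def weekly_seasonal_indices(sales, period=7):
    overall = sum(sales) / len(sales)
    return [sum(sales[d::period]) / len(sales[d::period]) - overall
            for d in range(period)]

# Two identical weeks with a weekend bump on days 5 and 6.
sales = [100, 100, 100, 100, 100, 130, 140] * 2
indices = weekly_seasonal_indices(sales)
deseasonalized = [s - indices[i % 7] for i, s in enumerate(sales)]
```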

On the ACF with a lag of 180, a monthly seasonality can be identified.

When the lags are reduced, the 7-day seasonality of the data is clearly identified.
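The sample ACF behind these plots can be sketched as follows (Python, toy data): for a series with a 7-day cycle, the autocorrelation peaks at lags that are multiples of 7.

```python
# Sample autocorrelation function r_k = c_k / c_0.
def acf(x, max_lag):
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

weekly = [5, 3, 2, 2, 4, 8, 9] * 20   # strong weekly cycle
r = acf(weekly, 14)
# r[6] (lag 7) is close to 1; other lags are clearly smaller
```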

1.1 Conclusion of EDA

This EDA helped us familiarize ourselves with the data in general and with our TX_1 store in particular. As far as sales are concerned, there is no particular trend, only slightly lower sales at the beginning of the observed period, probably because not all products were on sale from the start. There is a weekly and a monthly seasonality. The problem with a monthly frequency is its lack of consistency: not all months have the same number of days, so using a frequency of 30 or 31 would introduce a bias. Thus, we will not take the monthly seasonality into account in our models.

We see that the store is closed every year at Christmas, an important element that we will have to take into account in our models afterwards. It is not a seasonality but an event that repeats once a year.

Finally, the only event that has taken place during the 28 days that we have to predict is Independence Day, which takes place on the 4th of July each year. We will also have to take into account the influence of this event in our models.

2 Modeling

2.1 Preparing the data for analysis and general methodology

2.1.1 Putting our data into a time series

We were looking at two different frequencies: monthly and weekly. Each seemed to make sense when we used them to remove seasonality (see the EDA).

However, as noted in the EDA conclusion, a monthly frequency is not consistent, because months do not all have the same number of days; we therefore use an annual frequency of 365 instead.

So we have two frequencies to compare: one of 365 and one of 7.

2.1.2 Divide the time series into training/validation/test sets:

To avoid any problem related to the frequency used when creating the time series, we decided to subset with index brackets rather than time-based windows.

The data at our disposal contains 1969 days (rows); however, from row 1914 onward, the data consists only of NAs.

We decided to set aside the last 56 days (2 x 28) of the training data: this gives us a 28-day validation set to measure the predictions against, while still keeping a 28-day test set.
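In terms of indices, the split described above looks like this (a Python sketch of the slicing; the report does the same with R brackets):

```python
# Rows 1..1913 are usable; rows 1914..1969 are all NA and dropped.
N_TOTAL, FIRST_NA, HORIZON = 1969, 1914, 28

usable = list(range(1, FIRST_NA))             # days 1 .. 1913
na_rows = list(range(FIRST_NA, N_TOTAL + 1))  # days 1914 .. 1969, dropped
train = usable[: len(usable) - 2 * HORIZON]   # days 1 .. 1857
validation = usable[len(train): len(train) + HORIZON]  # days 1858 .. 1885
test = usable[len(train) + HORIZON:]          # days 1886 .. 1913
```

The validation window then covers days 1858 to 1885, which matches the forecast horizon printed later in the report.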

2.1.3 Choice of the metric

According to the paper “Another look at measures of forecast accuracy” (Hyndman and Koehler, 2006), using the MASE gives us a metric that is easily applicable across all our methods.

Moreover, since it is the mean absolute error of the forecasts divided by the in-sample mean absolute error of the one-step naive forecast, it can serve as a benchmark: a MASE above one means the model performs worse than in-sample one-step forecasts from the naive method and should not be retained.

Therefore, we use the MASE to evaluate the accuracy of the forecasts that follow.
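As a worked example of the definition above (Python, toy numbers, not the report’s data):

```python
# MASE = MAE of the forecast / in-sample MAE of the one-step naive
# forecast computed on the training series.
def mase(train, actual, forecast):
    naive_mae = sum(abs(train[t] - train[t - 1])
                    for t in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae

train = [100, 102, 98, 101, 99]        # in-sample |diffs|: 2,4,3,2 -> MAE 2.75
actual, forecast = [100, 103], [101, 101]
score = mase(train, actual, forecast)  # (1 + 2)/2 / 2.75
```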

2.2 Hierarchical modeling

2.2.1 First tests

Having to analyze the TX_1 store, we started by trying to model the sales of this store using all the data.

We quickly ran into the limited computing power of our laptops.

Moreover, we were not sure that the code and method used were the right ones. To better understand how the hts and forecast functions work, we decided to run the analysis only on HOBBIES_2, the department with the fewest items in the store.

We only managed to produce the following prediction: a model using the bottom-up ARIMA method.

Top level: HOBBIES_2
Bottom level: all items
28-day forecast: metrics of an ARIMA model, bottom-up approach

        HOBBIES_2  HOB2010TX1  HOB2016TX1
ME         -7.353       0.027       0.035
RMSE       12.955       0.427       0.189
MAE        11.359       0.354       0.036
MAPE       50.439         Inf         Inf
MPE       -42.279        -Inf        -Inf
MASE        1.264       3.689       0.122

The model is not good: the MASE of HOBBIES_2 is higher than 1. Still, using an ARIMA for some articles could be a good option, as we can see with the article HOB2016TX1, but for others the model does not work at all. Moreover, the problem remained the same: we cannot compute more than 3,000 different models, one for each item sold at Walmart. With more time, we could have treated each item separately to find the best possible model: probably an ARIMA for HOB2016TX1, and another model for HOB2010TX1, before aggregating them all.

2.2.2 Dropping the bottom level

In order to analyze all the sales of the store, an analysis stopping at the level of each department is more reasonable.

Thus we have modified our code to aggregate the sales of each item at the level of their respective department.

We computed five different models: an ARIMA model and an exponential smoothing state space model (ETS), each with both a bottom-up and a top-down approach, plus an optimal combination (MinT with shrinkage). To the bottom-up ARIMA model, we added regressors corresponding to Christmas and Independence Day.

Top level: TX1
Bottom level: dept_id
28-day forecast: metrics of different models

         Bottom-up             Top-down              Comb
         ARIMA      ETS        ARIMA      ETS        mint - shr
ME        671.431   417.965     580.152   438.109      410.373
RMSE     1228.330  1508.708    1460.943  1514.347     1502.121
MAE       940.313  1163.264    1118.037  1160.847     1158.583
MAPE       10.000    12.555      11.983    12.495       12.513
MPE         6.293     2.351       4.565     2.590        2.270
MASE        0.785     0.972       0.934     0.970        0.968

The ARIMA model using a bottom-up approach gives the best results.
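The two approaches compared in the table can be sketched as follows (Python, hypothetical numbers; in the report this is handled by the hts package): bottom-up sums the department forecasts to obtain the store total, while top-down forecasts the total and splits it by historical proportions.

```python
# Hypothetical department forecasts and historical sales shares.
dept_forecasts = {"FOODS": 600.0, "HOUSEHOLD": 250.0, "HOBBIES": 80.0}
hist_share = {"FOODS": 0.65, "HOUSEHOLD": 0.27, "HOBBIES": 0.08}

bottom_up_total = sum(dept_forecasts.values())   # bottom-up store total
total_forecast = 900.0                           # model fitted at the top level
top_down = {d: total_forecast * s for d, s in hist_share.items()}
```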

2.2.3 Using the levels above

Using the top-down method earlier, we realized that by limiting the top level to TX1 we were losing information that could be taken into account. Furthermore, including all the stores in Texas as well as all the stores in the US would not add many extra models for the computer to fit.

Thus we re-ran the models used previously.

Top level: all stores
Bottom level: dept_id
28-day forecast accuracy: metrics of different models

         Bottom-up             Top-down              Comb
         ARIMA      ETS        ARIMA      ETS        mint - shr
ME        212.608    77.068     187.703  1265.745     -680.343
RMSE      919.526  1072.454     959.829  1681.506     1273.730
MAE       747.640   920.409     783.848  1318.651     1102.523
MAPE        8.191    10.200       8.632    13.571       13.059
MPE         1.341    -0.529       1.027    12.853       -9.082
MASE        0.721     0.887       0.756     1.271        1.063

These models give better results.

We had omitted an option that until then seemed to make no sense: middle-out. Indeed, combining a bottom-up approach between the departments and the TX1 store with a top-down approach between all US stores and the TX1 store seemed promising.

Top level: all stores
Bottom level: dept_id
28-day forecast accuracy: metrics of the models

         Middle-out
         ARIMA      ETS
ME        295.082   156.670
RMSE      914.796  1081.349
MAE       747.125   917.643
MAPE        8.166    10.078
MPE         2.407     0.369
MASE        0.720     0.885
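The middle-out idea can be sketched in the same spirit (Python, hypothetical numbers): models are fitted at the middle (store) level, aggregated upwards bottom-up, and disaggregated downwards top-down.

```python
# Hypothetical store-level forecasts and historical department shares.
store_fc = {"TX1": 9000.0, "TX2": 11000.0}
dept_share = {"TX1": {"FOODS": 0.7, "HOUSEHOLD": 0.3}}

total = sum(store_fc.values())                  # bottom-up step: stores -> total
tx1_depts = {d: store_fc["TX1"] * s             # top-down step: store -> depts
             for d, s in dept_share["TX1"].items()}
```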
FORECAST_ALL_bu_arima %>% aggts(levels = 2)
## Time Series:
## Start = 1858 
## End = 1885 
## Frequency = 1 
##            CA1      CA2      CA3      CA4      TX1      TX2      TX3      WI1
## 1858 103696.11 26929.10 13648.00 7476.911 9587.075 11309.65 5122.647 4380.803
## 1859  86050.14 24313.87 10836.01 6459.310 8589.258 10679.97 4130.906 3399.896
## 1860  80159.48 23227.98  9991.79 6144.520 7908.932 10432.33 3663.817 3153.956
## 1861  84842.88 23041.83 10892.72 6194.616 7773.137 10395.03 3930.781 3284.990
## 1862  97779.06 25136.32 13439.16 6718.878 8289.824 11547.43 4664.284 4260.315
## 1863 110309.13 28030.84 15752.41 7608.252 9345.192 12495.48 5269.482 5289.967
## 1864 112319.15 28432.44 15705.07 7861.371 9705.399 12285.01 5345.011 5290.704
## 1865 102259.68 26663.78 13647.30 7367.153 9368.658 11629.67 4910.805 4472.906
## 1866  90709.95 24935.16 11577.33 6713.121 8762.823 11020.16 4337.970 3717.019
## 1867  84780.53 23652.79 10625.33 6335.044 8254.602 10666.19 4034.099 3295.475
## 1868  87869.02 23629.56 11312.84 6385.506 8104.054 10885.34 4190.971 3470.945
## 1869  97135.66 25332.61 13324.02 6868.413 8515.563 11517.75 4647.892 4258.929
## 1870 106498.66 27379.74 15079.49 7416.897 9083.858 12035.10 5049.074 4990.864
## 1871 107900.48 27660.49 15117.86 7582.375 9343.128 12056.09 5115.893 5026.607
## 1872 101772.80 26534.36 13686.03 7282.354 9199.678 11635.71 4838.235 4507.967
## 1873  92956.79 25188.73 11968.58 6799.965 8816.795 11159.25 4444.224 3862.752
## 1874  88184.71 24182.11 11060.76 6482.694 8443.581 10932.97 4224.240 3468.589
## 1875  89790.74 24054.66 11588.66 6536.826 8350.778 11047.80 4317.111 3619.738
## 1876  97038.83 25441.31 13213.73 6903.966 8599.392 11455.42 4626.726 4232.876
## 1877 103757.88 26958.06 14621.03 7296.035 8949.854 11851.60 4912.134 4781.751
## 1878 105461.25 27178.39 14758.46 7421.985 9146.326 11895.36 4974.956 4862.177
## 1879 100973.60 26373.09 13679.10 7213.295 9081.130 11621.89 4792.227 4493.471
## 1880  94507.60 25394.80 12223.42 6848.795 8830.306 11291.61 4516.783 3957.177
## 1881  90219.29 24507.56 11396.15 6603.213 8572.440 11086.47 4352.117 3618.644
## 1882  91495.06 24445.02 11803.48 6644.141 8503.957 11138.91 4404.114 3735.267
## 1883  96746.61 25484.96 13119.25 6919.727 8648.224 11439.24 4614.695 4203.719
## 1884 102247.20 26669.95 14299.65 7213.026 8879.942 11730.08 4819.892 4641.346
## 1885 103598.46 26856.27 14493.86 7311.363 9023.073 11785.50 4875.312 4739.820
##           WI2      WI3
## 1858 6176.895 4124.889
## 1859 5329.661 3627.467
## 1860 4939.751 3421.582
## 1861 5193.066 3409.778
## 1862 5949.951 3758.324
## 1863 6621.835 4239.140
## 1864 6608.905 4350.663
## 1865 6094.779 4081.890
## 1866 5516.201 3730.618
## 1867 5218.463 3484.130
## 1868 5401.211 3510.806
## 1869 5957.467 3818.709
## 1870 6419.620 4143.189
## 1871 6437.545 4224.592
## 1872 6067.164 4047.547
## 1873 5596.419 3757.607
## 1874 5336.946 3557.942
## 1875 5486.946 3596.572
## 1876 5936.103 3836.199
## 1877 6319.493 4079.028
## 1878 6358.597 4148.276
## 1879 6059.895 4010.090
## 1880 5644.681 3775.588
## 1881 5408.751 3620.209
## 1882 5532.740 3656.079
## 1883 5916.044 3844.747
## 1884 6256.086 4036.667
## 1885 6309.622 4092.182
plot(FORECAST_ALL_mo_arima, levels = 2)

Indeed, this was the method that gave us the lowest MASE.

3 Forecast accuracy

This is the graph of the validation days, with the model shown as a dashed line.

lowest <- aggts(FORECAST_ALL_bu_arima, levels = 4)

3.1 Modeling at level_0 with ARIMA

In the exploratory data analysis, we saw that our data had several seasonalities, with the weekly seasonality particularly marked. So we decided to create a time series with a frequency of 7 days.

In the next section, we will try to fit an ARIMA model in different ways on the top level (TX1).

3.1.1 Auto.arima without transformation

To start, we simply apply the auto.arima function to fit an ARIMA automatically, without any prior transformation. The approximation and stepwise parameters are set to FALSE.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(5,1,0)
## Q* = 437.2, df = 5, p-value < 2.2e-16
## 
## Model df: 5.   Total lags used: 10

The result is an ARIMA(5,1,0). The residuals are normally distributed and centered around zero, but the ACF shows that some dependence remains.

We then forecast the next 28 days and compute the accuracy using the validation set (which covers the same period).

28-day forecast accuracy metrics

                 ME      RMSE       MAE       MPE      MAPE      ACF1  Theil's U
Test set   154.9081  499.8964  388.8319  2.665502  11.03722  0.283865  0.8823459

3.1.2 Auto.arima with transformation

3.1.2.1 Trial A

We saw in the previous test that our model did not capture all the information contained in the data. We will therefore carry out two new tests, performing the transformations ourselves before applying the auto.arima function.

We start by applying differencing with a lag of 7.

We now have a time series centered around zero. However, some seasonality is still visible on the time plot and, more particularly, on the ACF, which points to a monthly seasonality (a lag of about 30).

We now apply auto.arima and see whether we get something better in terms of residuals compared to our first trial.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,5) with zero mean
## Q* = 321.44, df = 5, p-value < 2.2e-16
## 
## Model df: 5.   Total lags used: 10

This time we get an ARIMA(0,0,5) with zero mean, which shows that no additional differencing was applied. The residuals are still not satisfactory, but the pattern is less clear than before. The Ljung-Box test confirms that the residuals of our model are not white noise.

We’ll see with the metrics if our hunch is confirmed.

28-day forecast accuracy metrics

                 ME      RMSE       MAE  MPE  MAPE       ACF1  Theil's U
Test set   21.85714  417.5612  350.7143  100   100  0.4590362  0.9624683

Looking at the accuracy metrics, we can see that the RMSE goes from 499.89 to 417.56, so we have an improvement. However, it is important to note that we did not apply the inverse transformation after forecasting; we simply applied the same differencing to our validation set in order to compute the accuracy (we also lose 7 days of observed data on the graph). This is why the graph shows something very linear for the predictions. Normally, the inverse transformation must be applied to obtain the final predictions.
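For clarity, here is what the lag-7 differencing and the inverse transformation we skipped look like (a Python sketch; the report does this in R):

```python
# Seasonal differencing with lag 7 and its inverse, which rebuilds the
# original levels from the differences plus the first 7 observations.
def sdiff(x, lag=7):
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def inv_sdiff(d, head, lag=7):
    x = list(head)            # the first `lag` original values
    for v in d:
        x.append(v + x[-lag])
    return x

x = [10, 12, 9, 11, 13, 15, 14, 11, 13, 10]
d = sdiff(x)
restored = inv_sdiff(d, x[:7])   # identical to x
```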

3.1.2.2 Trial B

For this next test, we take our initial time series, apply differencing with a lag of 30, and then fit an ARIMA.

This time, it is the weekly seasonality that appears on the ACF, which is not really a surprise.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(5,0,0) with non-zero mean
## Q* = 806.84, df = 4, p-value < 2.2e-16
## 
## Model df: 6.   Total lags used: 10

The auto.arima function this time gives an ARIMA(5,0,0). Again, no further differencing was applied. As in the previous tests, the Ljung-Box test rejects H0 with a p-value below 0.05; this is also visible on the ACF.

28-day forecast accuracy metrics

                 ME      RMSE       MAE       MPE      MAPE       ACF1  Theil's U
Test set   27.02445  437.1928  346.4124  47.35097  103.1009  0.5029452     1.0649

The RMSE is worse than in our previous test with a value of 437.19.

3.1.3 Auto.arima with transformation and regressor

We saw on several previous graphs that the store closes at Christmas. Applying a 365-day differencing would affect every point of our time series when we are only interested in that particular day. One solution is to pass a regressor to the auto.arima function so that it takes this information into account; this is what we do in this section. We also apply a 7-day differencing, since we found that it gives us a better result.
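Such a regressor is just a 0/1 dummy per day. A sketch (Python; it assumes the series starts on 2011-01-29, the first date in the M5 data, and covers the 1913 usable days):

```python
from datetime import date, timedelta

# Build 0/1 dummies marking Christmas and Independence Day.
start, n_days = date(2011, 1, 29), 1913
days = [start + timedelta(t) for t in range(n_days)]
christmas = [int(d.month == 12 and d.day == 25) for d in days]
july4 = [int(d.month == 7 and d.day == 4) for d in days]
```

In R, such vectors are passed to auto.arima via its xreg argument.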

28-day forecast accuracy metrics

                 ME      RMSE       MAE       MPE      MAPE       ACF1  Theil's U
Test set   23.81543  417.3098  350.5692  101.6098  101.6098  0.4656876  0.9444874

By adding the regressor, the model improves only slightly on our previous best prediction: we obtain an RMSE of 417.31 instead of 417.56.

As a conclusion to this report, we have identified two models that seem conclusive to us: an auto.arima with a differencing of 7 and event regressors, as well as …. The two models are difficult to compare, because the auto.arima with differencing computes its accuracy on a validation set that has itself been differenced. Thus the MASE, our index for comparing the performance of our models, is not available for that model.

How could we have gone further in the project? First, we could have included more regressors in our model, such as all the events of the year. We only included Christmas, which has a considerable effect since it is the only day of the year when the store is closed and sales are therefore zero, and Independence Day, the only event that falls within the 28 days we have to forecast. We could then have taken the monthly seasonality into account in all the models; but, as mentioned before, since the number of days varies from month to month, we could not simply apply a 28- or 30-day differencing. Finally, another way to improve our predictions would have been to go one layer lower in our data, i.e. to make predictions for each individual product rather than stopping at the second level (dept_id).